Nebius
Solutions Architect · Onboarding Program
Portfolio by Rus Teston
SA Enablement · Structured Onboarding

90-Day SA Onboarding Blueprint

A week-by-week program designed to take a new Nebius Solutions Architect from Day 1 orientation through certified customer-ready status — with technical depth, measurable milestones, and a built-in manager coaching guide at every phase.

12 Weeks total
3 Phases
3 Cert gates
~52h Seat time
Program Timeline
Phase 01 · Foundation · Weeks 1–2 · ~12 hrs
Platform orientation, product portfolio, people and systems access. By end of Phase 1, the SA can explain Nebius AI Cloud to a colleague with confidence.
Phase 02 · Practitioner · Weeks 3–6 · ~24 hrs
Deep technical build: GPU clusters, orchestration, storage, MLOps, competitive positioning, and TCO modeling. By end of Phase 2, the SA can architect and defend a Nebius solution independently.
Phase 03 · Expert · Weeks 7–12 · ~16 hrs
Live deal involvement, customer-facing demos, Token Factory inference architecture, capstone design, and final certification. By end of Phase 3, the SA is independently customer-ready.
Wk 1 · Orientation & access
Wk 2 · Platform overview
Wk 3 · GPU deep dive
Wk 4 · Orchestration & storage
Wk 5 · MLOps & competitive
Wk 6 · TCO & gate 2
Wk 7 · Token Factory
Wk 8 · First customer call
Wk 9 · Demo mastery
Wk 10 · Solution design
Wk 11 · Capstone build
Wk 12 · Certification
Week-by-Week Program
Week 01 · Orientation, Access & the Nebius Story · Foundation
Key Activities
Complete HR onboarding, systems access, and equipment setup — Nebius console, GitHub, Slack, email, Okta SSO
Day 1 meeting with SA Manager — review 90-day program structure, expectations, and success criteria
Complete Nebius Company Story module — history, mission, Nasdaq listing, $700M raise, NVIDIA Reference Platform Partner status
First read-through of AI Cloud and Token Factory product pages at nebius.com
Meet your assigned buddy SA — schedule two job-shadow sessions for Weeks 2–3
People to Meet
SA Manager — 90-day program review and expectations alignment
Buddy SA — assigned peer for technical mentorship throughout the program
HR Onboarding — systems, benefits, and compliance training
IT / DevOps — Nebius console access, sandbox environment provisioning
Deliverables Due This Week
Nebius console access confirmed
GitHub + Slack onboarding complete
90-day program schedule agreed with manager
Buddy SA shadow sessions booked
Manager Action — Week 1
Conduct a Day 1 welcome meeting covering the 90-day program structure, success criteria, and how you'll assess readiness at each gate. Confirm all systems access is provisioned before end of Day 1. Assign buddy SA and brief them on expectations for mentorship.
By end of Week 1, the SA can...
Access all Nebius systems, explain the Nebius company narrative and funding story, and describe the 90-day program structure to a peer.
Week 02 · AI Cloud Architecture & Product Portfolio · Foundation · ✦ Gate 1
Key Activities
Complete AI Cloud Architecture Overview module — full-stack platform walk-through including data center topology (US, Finland, France, Iceland)
Read and study the SemiAnalysis TCO study — understand how Nebius achieved Gold Medal ClusterMAX rating across all three modeled workloads
Product portfolio deep-read: AI Cloud vs Token Factory — when to recommend each, how they complement
Review Trust Center documentation — SOC 2, HIPAA, GDPR, ISO 27001 compliance architecture
Gate 1 Assessment: 30-minute architecture walkthrough with SA Manager — explain the Nebius platform, product portfolio, and data center footprint from memory
People to Meet
SA Lead / Principal SA — architecture overview session and Q&A
Product Marketing — understand how Nebius's positioning is crafted and what materials are available
Buddy SA — first job-shadow session on a real customer call
Deliverables Due This Week
✦ Gate 1: Architecture walkthrough passed
SemiAnalysis TCO study notes submitted
Product portfolio one-pager (personal notes)
First buddy SA job-shadow completed
Manager Action — Week 2 (Gate 1)
Conduct the 30-minute Gate 1 architecture walkthrough. Evaluate: can the SA explain the Nebius platform unprompted? Do they understand the product portfolio distinction (AI Cloud vs Token Factory)? Can they articulate the data center footprint? Pass/fail with written feedback. If passing, advance to Phase 2 on schedule. If not, add a targeted remediation session before Week 3 begins.
By end of Week 2, the SA can...
Deliver a coherent, unprompted walk-through of Nebius AI Cloud architecture, the product portfolio, and the compliance posture — without referring to notes. Gate 1 passed.
Week 03 · NVIDIA GPU Portfolio & Hands-On Provisioning · Practitioner
Key Activities
GPU tier deep-dive: H100, H200, HGX B200, HGX B300, GB200 NVL72, GB300 NVL72 — architecture differences, memory specs, and use case fit
Hopper vs Blackwell architecture comparison — when to recommend which generation and why
Hands-on Lab 1: Provision an HGX H100 instance via Nebius console, then via CLI, then via Terraform — document the differences in provisioning time and configuration options
Run a basic CUDA workload and benchmark GPU utilization, building a personal reference for what Model FLOPs Utilization (MFU) looks like in practice
Second buddy SA job-shadow — focus on how the buddy handles GPU-related customer questions in real calls
People to Meet
Infrastructure Engineering Lead — GPU architecture context and cluster design principles
Buddy SA — second job-shadow session; debrief on GPU positioning in customer conversations
Deliverables Due This Week
Lab 1 completed: H100 provisioned via console, CLI, and Terraform
GPU tier comparison cheat sheet (personal reference)
MFU benchmark results documented
Manager Action — Week 3
Review the SA's GPU tier comparison cheat sheet — are the use case descriptions accurate and customer-friendly? Is the SA distinguishing Hopper from Blackwell correctly? Check in at end of week: is the SA comfortable in the Nebius console independently? Note any gaps for focused remediation in Week 4.
By end of Week 3, the SA can...
Match any customer workload description to the correct Nebius GPU tier without hesitation, provision a GPU instance via three different methods, and explain MFU in customer-friendly language.
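For the MFU benchmark in this week's lab, the arithmetic can be sketched as follows. This is a minimal sketch, assuming the common approximation that training a dense transformer costs about 6 FLOPs per parameter per token. The function name, the throughput figure, and the peak-FLOPs value are illustrative assumptions, not Nebius measurements; substitute the vendor's published dense peak for the GPU and precision actually benchmarked.

```python
def model_flops_utilization(params, tokens_per_second, num_gpus, peak_flops_per_gpu):
    """Return MFU as a fraction, using the 6 * params * tokens approximation."""
    if num_gpus <= 0 or peak_flops_per_gpu <= 0:
        raise ValueError("num_gpus and peak_flops_per_gpu must be positive")
    # Approximate FLOPs spent per second on forward and backward passes.
    achieved_flops = 6 * params * tokens_per_second
    # Theoretical ceiling for the whole cluster.
    peak_flops = num_gpus * peak_flops_per_gpu
    return achieved_flops / peak_flops


# Illustrative inputs only (assumptions, not measured results).
mfu = model_flops_utilization(
    params=7e9,
    tokens_per_second=80_000,
    num_gpus=8,
    peak_flops_per_gpu=1e15,
)
print(f"MFU: {mfu:.1%}")
```

In customer-friendly terms: MFU is the share of the hardware's theoretical math throughput that the training job actually uses.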
Week 04 · Orchestration: Managed Kubernetes & Slurm (Soperator) · Practitioner
Key Activities
Managed Kubernetes deep-dive — topology-aware scheduling, node health monitoring, auto-repair for fault-tolerant training
Soperator (Slurm) architecture — when customers choose Slurm over Kubernetes and why. HPC workload patterns vs ML training patterns
Hands-on Lab 2: Deploy a multi-node Kubernetes cluster on Nebius — configure a training job with topology-aware scheduling, simulate a node failure, and observe auto-repair behavior
InfiniBand networking module — NVIDIA Quantum-X800 InfiniBand fabric, non-blocking architecture, and what it enables for distributed training at scale
High-performance storage architecture — up to 1 TB/s read throughput for shared filesystems, 2 GB/s per GPU for object storage. WEKA and VAST Data integrations.
People to Meet
Platform Engineering — Kubernetes team — cluster architecture session and lab guidance
Storage engineering lead — storage architecture walk-through and customer scenario discussion
Deliverables Due This Week
Lab 2: Multi-node K8s cluster deployed and job scheduled
Node failure recovery documented and timed
InfiniBand vs standard networking one-liner (personal notes)
Manager Action — Week 4
Ask the SA to walk you through the K8s lab results as if presenting to a customer CTO. Can they explain fault-tolerant training in plain language? Do they understand when to recommend Kubernetes vs Slurm? This is the most technically demanding week — flag any major gaps immediately to allow mid-program adjustment.
By end of Week 4, the SA can...
Architect a multi-node training cluster on Nebius, explain fault tolerance to a customer engineering team, and articulate the InfiniBand and storage architecture story without referring to documentation.
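The storage throughput figures quoted in this week's activities lend themselves to a quick back-of-envelope check. The sketch below estimates a best-case read time for a dataset or checkpoint, assuming throughput scales linearly with GPU count up to an optional system-wide ceiling. The function name, parameter names, and data sizes are hypothetical; real throughput depends on file sizes, access patterns, and network contention.

```python
def best_case_read_seconds(size_gb, num_gpus, per_gpu_gb_per_s, ceiling_gb_per_s=None):
    """Best-case seconds to read size_gb when all GPUs read in parallel."""
    if size_gb < 0 or num_gpus <= 0 or per_gpu_gb_per_s <= 0:
        raise ValueError("size must be non-negative; GPU count and throughput must be positive")
    throughput = num_gpus * per_gpu_gb_per_s
    if ceiling_gb_per_s is not None:
        # Aggregate throughput cannot exceed the system-wide ceiling.
        throughput = min(throughput, ceiling_gb_per_s)
    return size_gb / throughput


# Object storage at the quoted 2 GB/s per GPU; 16 GPUs and a 640 GB checkpoint are illustrative.
object_store_s = best_case_read_seconds(640, 16, 2.0)

# Shared filesystem capped at the quoted 1 TB/s (1000 GB/s); the per-GPU rate is an assumption.
shared_fs_s = best_case_read_seconds(10_000, 1024, 2.0, ceiling_gb_per_s=1000.0)

print(f"object storage: {object_store_s:.0f} s, shared filesystem: {shared_fs_s:.0f} s")
```

A number like this is useful in a customer conversation as an upper bound on how quickly a job can resume after a node failure, not as a performance promise.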
Week 05 · MLOps Stack, Managed Services & Competitive Positioning · Practitioner
Key Activities
Managed MLflow — deploy, configure, and run an experiment tracking session. Understand model registry, artifact storage, and experiment comparison in the Nebius environment
Apache Spark on Nebius Managed Services — data pipeline architecture for ML preprocessing at scale
PostgreSQL managed service — metadata persistence patterns for production ML systems on Nebius
Competitive positioning deep-dive — Nebius vs AWS SageMaker, GCP Vertex AI, Azure ML, and CoreWeave. Differentiation framework: bare-metal MFU, availability speed, AI-native support, TCO.
Study real customer scenarios: Brave Search (inference), Recraft (training), Wubble (fine-tuning) — understand what drove each customer to Nebius and what they built
People to Meet
Product Marketing lead — competitive intelligence briefing and messaging alignment
Customer Success — Brave Search and Recraft account context; what drove the decisions
Sales lead — understand how AEs position Nebius and where SAs typically enter the conversation
Deliverables Due This Week
MLflow experiment deployed and documented
Competitive displacement one-pager (personal notes)
Customer scenario summary: Brave, Recraft, Wubble
Manager Action — Week 5
Run a mock competitive objection session — present the SA with "we're already on AWS" and evaluate the response. Is the displacement conversation natural and credible? Does the SA use the SemiAnalysis TCO framework as a proof anchor? This is a preview of what Gate 2 will require next week.
By end of Week 5, the SA can...
Deploy and configure the full Nebius MLOps stack, handle a competitive displacement conversation against AWS or GCP using the SemiAnalysis TCO framework, and name three Nebius customers with their workload type and business result.
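For the MLflow experiment tracking session, a first run might look like the sketch below. It uses the open-source MLflow Python client; the tracking URI, experiment name, run name, and hyperparameters are placeholders, and how the managed service on Nebius exposes its endpoint and credentials is not covered here. The loss values are synthetic, generated only so there is something to log.

```python
def synthetic_loss_curve(steps, start=2.5, decay=0.9):
    """Generate a fake, steadily decreasing loss curve for demo purposes."""
    return [round(start * decay ** step, 4) for step in range(steps)]


def log_demo_run(tracking_uri, experiment_name="sa-onboarding-lab", steps=5):
    """Log parameters and a synthetic loss curve to an MLflow tracking server."""
    # Imported lazily so the helper above can be used without MLflow installed.
    import mlflow

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run(run_name="baseline"):
        # Hypothetical hyperparameters, recorded once per run.
        mlflow.log_params({"learning_rate": 3e-4, "batch_size": 32})
        # One metric point per step so runs can be compared in the UI.
        for step, loss in enumerate(synthetic_loss_curve(steps)):
            mlflow.log_metric("train_loss", loss, step=step)


print(synthetic_loss_curve(3))
```

Calling `log_demo_run("http://localhost:5000")` against a local tracking server is a reasonable dry run before pointing the same code at a managed endpoint.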
Week 06 · TCO Modeling, Solution Design & Practitioner Certification · Practitioner · ✦ Gate 2
Key Activities
TCO modeling workshop — build a side-by-side cost comparison vs AWS for a realistic LLM pre-training workload using the SemiAnalysis framework. Document assumptions, GPU hours, storage, and networking costs.
Solution design practice — take a sample customer brief (mid-size AI startup, training a 7B parameter model) and produce a written solution design covering GPU selection, cluster configuration, storage architecture, and estimated TCO
Gate 2 Assessment: Live architecture demo to a mock customer panel (SA Manager + one senior SA). Present the solution design for the 7B model training scenario. 45 minutes including Q&A.
Review Gate 2 feedback and document personal development areas for Phase 3
People to Meet
SA Manager + Senior SA — Gate 2 mock customer panel assessors
Finance / RevOps — understand how Nebius pricing is structured and how SA-led TCO models feed into deal commercial structures
Deliverables Due This Week
✦ Gate 2: Architecture demo to mock panel passed
Written TCO model: Nebius vs AWS for 7B param training
Solution design document (7B model scenario)
Phase 3 development focus areas documented
Manager Action — Week 6 (Gate 2)
Conduct the Gate 2 live architecture demo with a senior SA as co-assessor. Use the standardized rubric: (1) Technical accuracy of GPU and cluster design, (2) Clarity of TCO model assumptions, (3) Ability to handle technical Q&A without breaking down. Written feedback within 24 hours. Gate 2 pass unlocks Phase 3 and first supervised customer call in Week 8.
By end of Week 6, the SA can...
Design a complete Nebius solution for a realistic customer scenario, present it to a technical panel, defend architectural choices under questioning, and build a credible TCO model. Gate 2 passed. Nebius Practitioner Badge earned.
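The TCO workshop asks for documented assumptions, GPU hours, storage, and networking costs. One way to keep those assumptions explicit is a small model like the sketch below. It assumes the 6 FLOPs per parameter per token approximation for training compute. Every price, MFU value, token count, and provider name is a placeholder, not a real quote from Nebius, AWS, or the SemiAnalysis study.

```python
from dataclasses import dataclass


@dataclass
class ProviderQuote:
    name: str
    gpu_hour_usd: float    # hypothetical price per GPU-hour
    storage_usd: float     # hypothetical storage cost for the whole run
    networking_usd: float  # hypothetical networking / egress cost
    expected_mfu: float    # assumed fraction of peak FLOPs achieved


def training_gpu_hours(params, tokens, peak_flops_per_gpu, mfu):
    """GPU-hours needed to train, given model size, token budget, and MFU."""
    total_flops = 6 * params * tokens
    effective_flops_per_gpu = peak_flops_per_gpu * mfu
    return total_flops / effective_flops_per_gpu / 3600


def total_cost(quote, params, tokens, peak_flops_per_gpu):
    """Compute cost plus storage and networking for one provider."""
    hours = training_gpu_hours(params, tokens, peak_flops_per_gpu, quote.expected_mfu)
    return hours * quote.gpu_hour_usd + quote.storage_usd + quote.networking_usd


# Placeholder scenario: 7B parameters, 1e12 tokens, 1e15 peak FLOPs per GPU.
PARAMS, TOKENS, PEAK = 7e9, 1e12, 1e15
quotes = [
    ProviderQuote("Provider A", gpu_hour_usd=2.0, storage_usd=1000, networking_usd=500, expected_mfu=0.40),
    ProviderQuote("Provider B", gpu_hour_usd=2.5, storage_usd=1500, networking_usd=2000, expected_mfu=0.35),
]
for quote in quotes:
    hours = training_gpu_hours(PARAMS, TOKENS, PEAK, quote.expected_mfu)
    cost = total_cost(quote, PARAMS, TOKENS, PEAK)
    print(f"{quote.name}: {hours:,.0f} GPU-hours, ${cost:,.0f} total")
```

Keeping MFU as a per-provider input makes it visible that a lower hourly price does not help if the achieved utilization is also lower.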
Week 07 · Token Factory: Production Inference Architecture · Expert
Key Activities
Token Factory API deep-dive — model endpoint architecture, vLLM-optimized throughput, and how Nebius has optimized DeepSeek R1 inference for production use
Hands-on Lab 3: Architect a production inference system using Token Factory — configure authentication, invoke a model endpoint, set rate limits, implement monitoring, and measure time-to-first-token
Latency-optimized serving for reasoning models — when to recommend Token Factory (managed) vs self-managed vLLM on AI Cloud compute
Autoscaling inference endpoint design — handling variable production traffic patterns for customer AI applications
Study Brave Search inference architecture — 11M+ AI-generated answers daily at ~100% GPU utilization. How did they achieve it?
People to Meet
Token Factory product team — product road map briefing and inference architecture deep-dive
Brave Search account team — case study debrief on how the inference architecture was designed
Deliverables Due This Week
Lab 3: Production inference endpoint live and monitored
Token Factory vs self-managed decision framework (personal notes)
Brave Search architecture summary (personal study notes)
Manager Action — Week 7
Ask the SA to demo the Token Factory endpoint they built in Lab 3 as if presenting to a customer CTO. Evaluate: can they explain the Token Factory vs self-managed tradeoff clearly? Are they comfortable with the vLLM/DeepSeek R1 context? Confirm Week 8 customer call is scheduled with a real AE partner.
By end of Week 7, the SA can...
Architect a production inference system on Token Factory, explain the managed vs self-managed inference tradeoff to a customer, and describe how Brave Search achieves ~100% GPU utilization on Nebius.
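Lab 3 asks for a time-to-first-token measurement. The sketch below shows the measurement itself, independent of any particular client library: it times how long a streamed response takes to yield its first token and its last. The `fake_endpoint` generator is a hypothetical stand-in for a real streaming endpoint, with made-up delays; in the lab it would be replaced by the iterator returned by whichever streaming client is used.

```python
import time


def measure_stream(token_stream):
    """Consume a token iterator; return (ttft_seconds, total_seconds, token_count)."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in token_stream:
        if first_token_at is None:
            # Timestamp of the first token to arrive.
            first_token_at = time.perf_counter()
        count += 1
    end = time.perf_counter()
    ttft = None if first_token_at is None else first_token_at - start
    return ttft, end - start, count


def fake_endpoint(tokens, first_delay=0.05, gap=0.01):
    """Hypothetical streaming endpoint used only for local testing."""
    time.sleep(first_delay)  # simulated queueing and prefill time
    for index, token in enumerate(tokens):
        if index:
            time.sleep(gap)  # simulated per-token decode time
        yield token


ttft, total, count = measure_stream(fake_endpoint(["Hello", ",", " world"]))
print(f"TTFT {ttft * 1000:.0f} ms, total {total * 1000:.0f} ms, {count} tokens")
```

Reporting time-to-first-token separately from total time matters for reasoning models, where long outputs can hide an otherwise responsive endpoint.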
Week 08 · First Live Customer Call: Supervised Observation · Expert
Key Activities
First live customer call: Attend a real AE-led customer discovery call as an observer. Do not take the technical lead — observe how the AE qualifies the workload and when/how they introduce the SA
Post-call debrief with the AE partner — what signals did the customer give? What solution would you design? What would you have done differently?
Second live customer call — take the technical lead on a follow-up call with AE present. Answer technical questions, qualify workload depth, and propose next steps
RAG and Agentic Search solutions study — Nebius RAG architecture, embedding model selection, vector storage options, retrieval optimization patterns
Fine-tuning architecture review — QLoRA, PEFT, LoRA patterns on Nebius. When to recommend fine-tuning vs RAG vs full training.
People to Meet
AE partner — assigned Account Executive for live customer call co-sell experience
Customer(s) — first live customer interaction under SA Manager supervision
SA Manager — end-of-week coaching debrief on first customer call performance
Deliverables Due This Week
First customer call completed (observer role)
Second customer call completed (technical lead role)
Post-call solution design note submitted to manager
RAG vs fine-tuning decision framework (personal notes)
Manager Action — Week 8
Attend the second customer call where the SA takes the technical lead role. Evaluate: did they listen before proposing? Did they correctly identify the workload type? Did they position the right Nebius solution? Did they gracefully handle a technical question they could not answer? Structured coaching debrief within 48 hours of the call.
By end of Week 8, the SA can...
Lead the technical portion of a live customer discovery call, ask qualifying questions, correctly identify the customer's workload type, and propose a Nebius solution — without manager prompting.
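For the fine-tuning architecture review, it helps to see why LoRA is cheap relative to full training. A rank-r adapter on a weight matrix of shape d_out × d_in adds r × (d_in + d_out) trainable parameters. The sketch below applies that formula; the hidden size, layer count, rank, choice of adapted matrices, and 7B base size are illustrative assumptions, not a specific Nebius configuration.

```python
def lora_trainable_params(d_in, d_out, rank, num_adapted_matrices):
    """Trainable parameters added by LoRA: two low-rank factors per adapted matrix."""
    return rank * (d_in + d_out) * num_adapted_matrices


hidden = 4096          # illustrative hidden size
layers = 32            # illustrative number of transformer layers
adapted_per_layer = 2  # e.g. query and value projections

adapter_params = lora_trainable_params(
    hidden, hidden, rank=16, num_adapted_matrices=layers * adapted_per_layer
)
base_params = 7_000_000_000
print(f"{adapter_params:,} trainable parameters "
      f"({adapter_params / base_params:.2%} of the base model)")
```

A figure like this supports the fine-tuning vs full training conversation: the adapter is a small fraction of the base model, so optimizer state and checkpoint sizes shrink accordingly.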
Week 09 · Demo Mastery: Building & Delivering the Nebius Demo · Expert
Key Activities
Build a personal demo environment — a repeatable, customer-ready Nebius AI Cloud demo that can be tailored to training, fine-tuning, or inference scenarios in under 10 minutes of prep
Deliver three internal demo run-throughs — to SA Manager, to a senior SA, and to an AE partner. Incorporate feedback after each run.
Demo disaster recovery practice: rehearse handling a live failure (cluster not provisioning, latency spike, console error) without breaking the narrative for the customer
Study the Nebius Solution Library on GitHub — understand which Terraform recipes are most commonly used in customer POCs and which are relevant to your territory
Third live customer call — take full SA ownership of the technical conversation. No observer safety net.
People to Meet
AE partner — demo feedback session from a sales perspective: "what landed, what confused the customer"
Senior SA — advanced demo technique session and feedback on narrative structure
Customer — third live call; SA fully independent
Deliverables Due This Week
Personal demo environment built and documented
Three internal demo run-throughs completed
Demo feedback log maintained (one entry per run)
Third live customer call completed independently
Manager Action — Week 9
Watch one of the SA's internal demo run-throughs. Score against three criteria: (1) Technical accuracy — is the demo showing what it claims to show? (2) Narrative clarity — does the customer know why this matters at every step? (3) Disaster recovery — what happened when something went wrong in the run? Share written scoring before the AE partner demo run-through.
By end of Week 9, the SA can...
Deliver a polished, customer-ready Nebius AI Cloud demo tailored to any workload scenario, recover gracefully from a live technical failure, and lead a customer call fully independently without manager observation.
Week 10 · Advanced Solution Design: Multi-Workload Customer Scenarios · Expert
Key Activities
Complex scenario design practice. Work through three advanced customer scenarios: (1) a pharmaceutical company needing HIPAA-compliant AI training in the EU, (2) a media company scaling Stable Diffusion inference to production, (3) a fintech building an agentic search system with RAG on Nebius
Physical AI and Robotics solution patterns — how Nebius's infrastructure supports simulation workloads and physical AI model training
Enterprise security and compliance deep-dive — tenant-level isolation, IAM architecture, and how to respond to an enterprise security questionnaire using the Trust Center
Prepare capstone brief: review the capstone scenario (provided by the SA Manager at the end of Week 10) and begin architecture planning
People to Meet
Security / Compliance lead — enterprise security questionnaire walk-through
Senior SA — complex scenario review and architecture feedback
SA Manager — capstone scenario briefing at end of week
Deliverables Due This Week
Three advanced scenario solution designs (written)
Enterprise security Q&A using Trust Center completed
Capstone scenario brief received and initial architecture sketch begun
Manager Action — Week 10
Review the SA's three advanced scenario solution designs. Are the compliance recommendations accurate for the pharmaceutical EU scenario? Is the Stable Diffusion inference architecture optimal for the media scenario? Brief the SA on the capstone scenario at the end of Week 10 so that architecture planning can start before the Week 11 build begins.
By end of Week 10, the SA can...
Design a complete Nebius solution for a regulated-industry customer with EU data residency, handle an enterprise security questionnaire independently, and respond to advanced agentic AI and RAG architecture questions.
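For the RAG and agentic search scenario, the retrieval step can be illustrated with a toy example. The sketch below ranks documents by cosine similarity between a query embedding and document embeddings, using plain Python lists. The three-dimensional vectors and document ids are invented for illustration; a real system would use an embedding model and a vector store, neither of which is specified here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, documents, k=2):
    """documents is a list of (doc_id, embedding); return the k most similar ids."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, doc[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy embeddings (made up for illustration).
docs = [
    ("pricing", [0.9, 0.1, 0.0]),
    ("gpu-specs", [0.1, 0.9, 0.2]),
    ("compliance", [0.0, 0.2, 0.9]),
]
query = [0.2, 0.8, 0.1]
print(top_k(query, docs))
```

The retrieved ids would then be resolved to text passages and placed in the model prompt; retrieval optimization is largely about improving this ranking step.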
Week 11 · Capstone Build: Full Solution Design for Realistic RFP · Expert
Key Activities
Respond to the capstone RFP scenario (provided Week 10) — a realistic enterprise AI workload covering training, fine-tuning, and production inference requirements across multiple use cases
Produce a complete solution design document including: executive summary, architecture diagram, GPU selection rationale with Hopper vs Blackwell recommendation, cluster configuration, storage design, MLOps stack recommendation, and full TCO model vs AWS
Prepare the 30-minute customer presentation — slides, live demo environment, and anticipated Q&A responses. Rehearse twice before Week 12 delivery
Internal rehearsal with AE partner — simulate the full presentation including objections and competitive challenge from a mock AWS team
People to Meet
AE partner — rehearsal panel for capstone presentation; provide competitive challenge
Buddy SA — peer review of solution design document before final submission
SA Manager — mid-week check-in to ensure capstone is on track
Deliverables Due This Week
Capstone solution design document completed
Architecture diagram finalized
TCO model: Nebius vs AWS completed
30-minute presentation deck ready for Week 12
Internal rehearsal with AE partner completed
Manager Action — Week 11
Review the capstone solution design document before Week 12 begins — do not wait until the presentation. Flag any technical inaccuracies, missing components, or weak sections so the SA has time to correct before the panel. Confirm the Gate 3 panel (SA Manager + VP Sales or CRO + one Senior SA) is scheduled and briefed on their assessor roles.
By end of Week 11, the SA can...
Produce a complete, presentation-ready solution design document for a complex enterprise AI workload — covering architecture, GPU selection, storage, MLOps, TCO, and compliance — without manager assistance.
Week 12 · Expert Certification: Capstone Presentation & SA Graduation · Expert · ✦ Gate 3
Key Activities
Gate 3 Capstone Presentation: 30-minute solution presentation to the full panel (SA Manager, VP Sales or CRO, senior SA). Present the complete solution design, deliver a live Nebius demo, defend architectural choices under competitive challenge from the panel, and walk through the TCO model.
Panel debrief and structured feedback — written assessor scores across five dimensions: technical accuracy, solution completeness, demo quality, objection handling, and commercial awareness
Certified Nebius SA Expert ceremony — formal graduation, badge issuance, and first solo customer account assignment
90-day retrospective with SA Manager — what worked in the program, what to improve for the next cohort, and the SA's 30-60-90 growth plan for the next quarter
People to Meet
SA Manager — Gate 3 lead assessor and graduation ceremony
VP Sales or CRO — panel member; executive perspective on commercial relevance of the solution
Senior SA — panel member; technical depth assessor
First solo customer — account assigned post-certification; introductory call in Week 13
Deliverables Due This Week
✦ Gate 3: Capstone presentation to panel passed
✦ Certified Nebius SA Expert badge issued
Written panel feedback received and reviewed
90-day retrospective completed with manager
30-60-90 growth plan for Q2 agreed
First solo customer account assigned
Manager Action — Week 12 (Gate 3 + Graduation)
Lead the Gate 3 capstone panel. Score across five dimensions using the standardized rubric. Deliver the certification badge and written assessment within 24 hours of the presentation. Complete the 90-day program retrospective — capture feedback on module quality, pacing, and lab relevance for use in the next SA onboarding cohort. The SA's 30-60-90 growth plan for Q2 should be agreed before end of week.
By end of Week 12, the SA is...
A Certified Nebius SA — Expert. Independently customer-ready. Assigned their first solo account. Capable of architecting, demonstrating, and defending any Nebius AI Cloud solution without manager support.